【week15】820. Short Encoding of Words 解题报告

12月 11 2018 algorithm 3 分钟读完 (约 517 字)

题目

Given a list of words, we may encode it by writing a reference string S and a list of indexes A.

For example, if the list of words is ["time", "me", "bell"], we can write it as S = "time#bell#" and indexes = [0, 2, 5].

Then for each index, we will recover the word by reading from the reference string from that index until we reach a "#" character.

What is the length of the shortest reference string S possible that encodes the given words?

Example:

1
2
3

Input: words = ["time", "me", "bell"]
Output: 10
Explanation: S = "time#bell#" and indexes = [0, 2, 5].

Note:

1 <= words.length <= 2000.
1 <= words[i].length <= 7.
Each word has only lowercase letters.

解题报告

题意不太简单，绕了一下。

很容易想到，如果一个单词 $words[i]$ 可以被 $words[j]$ 代替，那么它必然是 $words[j]$ 的子串。而且是从尾部开始的子串。

问题是怎么检测出两个单词间的关系，暴力一点可以直接两层循环 $O(N^2)$ ，但是这样太慢了，需要找一种方法快速剔除子串。这样我想到了哈希，首先可以把所有单词放进一个 unordered_set 里，然后把剔除每个 $word$ 的每个子串，这看起来很麻烦，但其实并不会花费很多时间，因为 unorder_set 的操作是 $O(1)$ 的，而每个单词的长度不超过 7，几乎可以忽略不计，所以基本上整个剔除操作几乎是 $O(N)$ 的。而后面的计算答案就不用说了，时间代价也是 $O(N)$，所以总的时间代价是 $O(N)$ 的。

当然，这么做也不是最好的，还有一种方法是建立 Trie 树，但是这个我不会。。看评论区还有一种巧妙的算法是反转+排序：反转每个字符串，然后排序，这样只需检查 $sorted[i]$ 和 $sorted[i+1]$ 的关系即可，感觉也很独特。

Solution

class Solution {
public:
    int minimumLengthEncoding(vector<string>& words) {
        unordered_set<string> us(words.begin(), words.end());
        for (auto w: words)
            for (int i = 1; i < w.size(); ++i)
                us.erase(w.substr(i));
        int ret = 0;
        for (auto s: us)
            ret += s.size() + 1;
        return ret;
    }
};

#homework #algorithm

【week15】820. Short Encoding of Words 解题报告

题目

解题报告

Solution

评论

Your browser is out-of-date!